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Supplementary appendix 

This part of the paper may not necessarily be essential for understanding the main points of 
the paper. Nevertheless, it may be crucial for those readers who intend to improve, extend or 
adapt our theoretical formulation to whatever suits their needs. It may also entertain those 
readers who are fond of mathematics. The sections below were all adapted from the relevant 
(sub-)sections of our unpublished paper deposited in bioRxiv [32] . 

SA-1. Equivalence of our bra-ket formulation to standard vector-matrix formulation of 
continuous-time Markov model 

[This section was adapted from subsection 1.1 of [32]. See subsection 1.2 of ibid, for a simple 
example. And see subsection 1.3 of ibid, for a brief discussion on the differences from the use 
of bra-ket notation in quantum mechanics.] 

Let us first recall the conventional formulation of a general continuous-time Markov 
model on a finite space consisting of N states, i = 1,2,..., N . One way of formulating the 
model is to specify a rate matrix, Q = (q^ ). Let q t] denote the (/', /) -element of Q , i.e., its 
element at the intersection of the i th row and the j th column. Then, the non-diagonal 
element q if (i* j) of a rate matrix Q is the rate (per certain unit time) at which the system 
moves to the j th state, given it was in the i th state immediately before the time in 
question. The diagonal element, q u , is usually given by the equation: 

«»—!>. ««. — Eq.(SA-l.l) 

0*0 

to guarantee that the summation of the probabilities over the states remain 1 (unity) all the 
time. Now, let the probability vector, p(t) = (p,(0), be a row vector whose i th element, 
Pi(t) , is the probability that the system is in the i th state at time t . Then, under the above 
Markov model, pit) satisfies the 1st order time differential equation: 

pP(t) = p(t)Q (or ^-Pi(t) = y N pjit)q fi ). —Eq.(SA-1.2) 

The general solution of this equation at a finite time t{> 0) is given by: 

p(t) = p(0)P(t) (or p i (t) = '2, N j _ i Pj(0)p ji (t)) . — Eq.(SA-1.3a) 

Here the finite-time transition matrix, P(t) = I /; /7 (/)j = exp(f<2), is an N xN matrix whose 

(i,j) -element p tJ it) is the probability that the system is in the j th state at time t , 
conditioned on that it was in the i th state initially (i.e., at time / = ()): 
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Pijit) = [exp {tQ)\j = P[(J,t) |0',0)] . 


— Eq.(SA-1.3b) 


N 

If Eq.(SA-l.l) holds, the matrix elements satisfy > pM) = 1 (unity) for all i = 1,2,..., N 

Meanwhile, p(0) = (p,(0)) is the initial probability vector, whose i th component, /;,(()), 
is the probability that the system was in the i th state at time t = 0 . They satisfy 

^ { Pj( 0) = 1 • This could be made more explicit by using the basic row vectors, 2 N . 

Here, e tt = (0,...,1,...,0) is the row vector with all zero except the i th component, which is 
1 (unity). This “basic vector” represents the situation where the system is in the i th state. 
Using these basic vectors, the initial probability vector is expressed as: 

m-2 N M Pt°K • — Eq.(SA-1.4) 

The expression is interpreted as the initial condition that the system is in the i th state with 
probability p,(0) (i = 1,2,..., N). Similarly, the probability vector at any time could be 

expressed as: 

p(t) = ^ =[ p i (t)e i . — Eq.(SA-1.5) 

It is interpreted as the situation where the system is in the i th state with probability p i (/) 
(i = 1,2,..., N) given by Eq.(SA-1.3a). Using the basic vectors, the conditional probabilities 
can be formally extracted from the finite-time transition matrix, P(t) = cxp(/0), by a matrix 
multiplication: 

P[{j,t) |0',0)] = p..(t) = e t P{t)(ej . - Eq.(SA-1.6) 

Here, (e l )' is the column vector obtained from the row vector, e t , by a matrix transposition 
operation ( i.e., by interchanging the rows with the columns). 

Now we can introduce the bra-ket notation and operators. First, we replace each 
basic row vector, e i , with the corresponding basic bra-vector, (i |, and replace each basic 
column vector, (if / , with the corresponding basic ket-vector, \j} . Then, the bra-vector 
corresponding to the probability vector p(t) = (/;,(/)) in Eq.(SA-1.5) is given by the 
following linear combination of the basic bra-vectors: 

= • -Eq.(SA-1.5’) 

In the present formulation, the exclusive role of a ket-vector is that it serves as an “acceptor” 
of bra-vectors. More specifically, we will make the ket-vector, |y^>, accept only the 
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corresponding bra-vector, (j |, by defining the scalar products: 

(i| j) = 1 if i = j, = 0 (/ i*j. — Eq.(SA-1.7) 

Using these scalar products, we get, e.g., the equation, (j?(0|*) = pft), from Eq.(SA-1.5’). 
Next, we introduce (linear) operators that transform each bra-vector into a specified linear 
combination of bra-vectors. The operators are analogs of matrices in the traditional 
formulation. For example, we could define an operator, m(i —* j ), that transforms (or 
“mutates”) the i th state to the j th state, but does nothing else: 


Eq.(SA-l .8) 


(i | m(i —> j) = (j |, 

(k | m(i -»> j) = 0 for k*i. 

This operator corresponds to the matrix whose elements are all zero except the (i,j) -element, 
which is 1 (unity). Now, we define the (instantaneous transition) rate operator, Q , as follows: 




— Eq.(SA-1.9) 


Then, we get the following equation: 


Then, substituting Eq.(SA-l .2) for the expression in braces on the leftmost hand side, we 
have: 

This means that we can recast the defining equation, Eq.(SA-l .2), of the continuous-time 
Markov model into the equation satisfied by the probability bra-vector (fp(t )|: 

j t (p(t)\ = {p(t)\Q . — Eq.(SA-1.2’) 

This equation can be integrated as: 


{p(t)\ = {p(0)\P(t) 


— Eq.(SA-1.3a’) 


with the finite-time transition operator, P(t) = exp (tQ) . And the counterpart of Eq.(SA-l ,3b) 
is: 


</|P(O|j> = (/|ex P a0|j> = P[(j,O|(bO)] . —- Eq.(SA-1.3b’) 


Solving Eq.(SA-l .2’) for every possible initial probability bra-vector, (p(0)| = ^ (| P,(0)^/|, 
is equivalent to solving the following equation for the operator P(t ): 
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fl 'V /V /V 

-Pit) = Pit) Q 
dt 


— Eq.(SA-l.lOa) 


with the initial condition, 

P(0) = I, — Eq.(SA- 1.10b) 

where I is the identify operator: (/|/=(/| for every state i . Thus, if desired, 

Eqs.(SA-1.10a,b) could be considered as the defining equation of the continuous-time 
Markov model. 

Thus far, we tacitly assumed that the Markov model is time-homogeneous, where 
the rate matrix Q , or the rate operator Q , is independent of time / . In reality, the transition 
rate, q tj , could depend on time due to ,e.g., the temporal change of the environment the 

system is in. Here, we extend the formulation developed above to the system with a 
time-dependent rate matrix, Q(t ) = (^-(f)), whose operator counterpart is denoted as Q(t ). 


Because the model is no longer homogeneous in time, when we consider a finite-time 
evolution of the system, we need to specify the initial time tj , in addition to the final time 
t F (>/,). Let P(tj, t F ) be the operator describing the finite-time transition during the closed 
time interval, [t /5 t F ] , that is: 

(i\P(t I ,t F )\j) = P[(j,t F )\U,t I )] for v /,jG{l,2,...,A}, t F >tj, 


under a time-heterogeneous continuous-time Markov model with the rate operator Q(t). 
Then, the defining equations, Eqs.(SA-1.10a,b), are extended to fit this model as: 


^P(t I ,t) = P(t I ,t)Q(t) , — Eq.(SA-l.lOa’) 

dt 

P(t,t) = I for y t . — Eq.(SA-l.lOb’) 

The general solution of the above equations is symbolically given by: 


Pit n t) = T exp lf'dt'Qitj\\ . 


— Eq.(SA-l.ll) 


Here Tj...} denotes (the summation of) the time-ordered product(s), which arrange(s) 
multiplied operators in the temporal order so that the earliest operator will come leftmost. For 
example, 


T{A( tl )B(t 2 )} 


A(q)B(t 2 ) for t x <t 2 , 

< 

B(t 2 ) A(?j) for t 2 <t x . 


We could regard the time-ordered exponential in Eq.(SA-l.ll) as defined by a limit: 
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where t) L] = t, + (i - j)^f-, or as defined by a series: 

eXP (f< dT 2( T ))l = i + 2/ dT l 'f dr nQ( r i)--Q(' t n) 

n= 1 t I <T l <...<r n <t 

= /+ f'dT 1 2(r,)+ f'jT, f dr, Q(r x )Q(r,) + f'jr, f dr, f' dr 3 2 (t,)2(t,)2(t 3 ) +... 

Moreover, the finite-time transition operator given by Eq.(SA-l.ll) also satisfies the 
“backward equation”: 

4- ht,t F ) = -Q(t)P(t,t F ) , — Eq.(SA-1.12) 

at 

as well as the Chapman-Kolmogorov equation (aka the multiplicativity condition): 

P(t I ,t F ) = P(t l ,t M )P(t u ,t F ) (t, < t M < t F ) . — Eq.(SA-1.13) 

The latter could be rewritten in terms of conditional probabilities: 

P[(j,t F ) 1(1,0] = ^P[(k,t M ) \(i,t r j] P[(j,t F ) | (k,t M )\. - Eq.(SA-1.13’) 

k =1 

The last equation can be obtained by sandwiching the both sides of Eq.(SA-l.13) with (i\ 

and \j), and by inserting the decomposition of the identity operator, / = ^ | k^(k |, 

between the two finite-time transition operators on its right-hand side. 

As described above, we have reformulated a continuous-time Markov model on a 
finite set of states in terms of bra-vectors, ket-vectors and operators. Once we formulated it 
this way, we could extend the formulation to continuous-time Markov models defined on any 
discrete set of states, irrespective of whether it is finite, countably infinite, or uncountable, as 
long as the state space and the elementary transitions within it are well-defined. This notation 
facilitated the development of our ab initio theoretical formulation of the general 
continuous-time Markov model describing the evolution of an entire sequence along a 
time-axis via insertions/deletions (and substitutions if desired). 

SA-2. Factorability of pairwise alignment probability: mathematically rigorous proof 

This section presents probably the mathematically toughest among the proofs/derivations 
given in this paper. Nevertheless, if you keep in mind the intuitive pictures acquired in the 
main text (and in Additional file 1), they may not be so formidable as they appear at first 
glance. 



T lexp 
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SA-2.1. Outline 

[This subsection was adapted from a part of subsection 4.1 of [32].] 

Here, we give a mathematically rigorous proof of the factorability of the probability of a 
given pairwise alignment (PWA) under the conditions (i) and (ii) given in section R6 of the 
main Results and discussion (and also in SM-2 in Additional file 1). Actually, in SM-2 in 
Additional file 1, we already gave a proof that is nearly rigorous; the only loose point there 
was the proof of the ansatz: 


[( 

M 

? J | (^ ’ ^/) 

L\ 

- 

LHS / 


M[k,N k ]\, [t n t F ]) | (V 

k =1 L 



- Eq.(SM-2.8) 

among the probability quotients, Eqs.(SM-2.4-6) (see also Eq.(SM-2.7)). Thus, we here focus 
on rigorously proving Eq.(SM-2.8). As in SM-2, we consider a local history set (LHS) 


denoted as: M = 


M[k,l],...,M[k,N k ]\ 


k =I. K 


First we recall that the left-hand side of Eq.(SM-2.8) is rewritten as: 




M 


,\t I ,t F ]\(s A ,t I ) 


1LHS 


[ip 


[M 1 ,M 2 ,--,M JV ]eM 


([M l ,M 2 ,--,M N ],[t I ,t F ])\(s A ,t I ) 


S /•••/ dr 1 ■dr N 

[ N ] 

[ M, , M 2 , • ■ ■, M n M t I =r Q <r \ < "' <r N <r N,\ =t F 

L J LHS 

x exp j ” +1 dr 5R(s v , r) l 


l v-0 J 


J LHS 

rN 


ll/ ( ^ ;S v-l’ T v) 


s 0 =s , 

for v=l. N 


— Eq.(SA-2.1.1) 

(It corresponds to Eq.(SM-2.9).) Here, SRf(s, s', t ) = Rf{s, t)-Rx{s', t) denotes an 
increment of the exit rate. We also recall that the right-hand side is rewritten as: 


f\[i P [([M[ML M[k,N k ]\ [t n t F ]) | (/, t,) 

k =1 L 


A 

-n 


k =1 


f' ’ 'f dr(k,l) dr(k,N k ) (n N "_r^M[k,i k \, s^TiKi^ 

tj=T(k,0)<T(k,l)< • • • <r(k,N k )<r{k,N k +\)=t F 


xexp 


_y r^ +X) dr 8 R ID^ A j 

ZjJr(k,i k ) X 


4=0 


(so|=(s A |. 
for i. I.V. 


— Eq.(SA-2.1.2) 

(It corresponds to Eq.(SM-2.10).) Here, r (k, i k ) denotes the time at which the event 
M[k,i k ] virtually occurred in the isolated k th local history. Second, we note that the 
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/v ? 

subject LHS equivalence class, M , consists of -—— global indel histories. Each 

i Ls n,-, w * ! 

history corresponds to a map from each event in each local indel history (specified by k) to a 
temporal order within the global history: 

it : (k, 4) (k = 1, K ; 4 = 1, N k ) l—> v (= 1, N ). 

The map keeps the relative temporal order among indels in each local indel history. Then, 

[Mj, M 2 , •••, M n ] in the middle and on the rightmost side of Eq.(SA-2.1.1) corresponding to 

the above Jt can be more precisely written as: M'[^r _1 (l)], •••, M'[^r _1 (A)]j. 

Here is an equivalent of M[j r“‘(v)] (= M[k, i k ] for 3 (k, i k )) through a series 

of binary equivalence relations, Eqs.(R5.2a-d), needed for rearranging the events in M this 


way. Now, let n 



be the set of such 



maps. Then, the middle and the 



— Eq.(SA-2.1.1’) 

Comparing Eq.(SA-2.1.1’) with Eq.(SA-2.1.2), we can see that the ansatz, Eq.(SM-2.8), 
should hold if and nearly only if the following two equations are satisfied. 

(A) The equation between the multiple-time integration operations: 
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f‘“f \Y[dt(k,\)~dt(k,N k ) 

tj<r(jT~ l (l))<---<T(ji;~\N))<t F \ k =1 

K ( \ 

- n /•••/ dr(k,l) dr{k,N k ) F k (t(&,1),..., r(k,N k )) 

— Eq.(SA-2.1.3) 

for any set of non-singular functions, F k (r(/r,l),..., r(k,N k )} | k = 1,A'j . Because the both 



/ k-\ 


sides share the same integrands and the dummy time variables for integration, it is tantamount 
to the equation between the domains of integration. 

(B) The equation between the integrands (/.<?., the probability densities): 


K N k 

\ \Y\ r (M'[k,i k ];s(jz(k,i k )-l),r(k,i k 


k= 1 4=1 


>) 


/ 


(s(0)[={/|, 

j(s(v)|=^s(v-l)|M'[jr _1 (v)] | v=l -Vj 


x ex P \-'ZC,Z? dz SR '° « v >' T >| 


v=0 


(s(0)\=(s A \, 

|^(v)|=^(v-l)|M'[^r _1 (v)] | v=l,...,Afj 


i\ 

n 

k =1 


iV it 


\ 


( S o|=(^ |- 


xexp- 

K 

-2 

dr 6R , ^ > (s i , s A , r) 

z*iJnk,i k ) x v 4’ 

(j 0 =(./, 


k= 1 

4=0 

{(%K s *-iI* tM ‘ ] i 4-1 .'M 


— Eq.(SA-2.1.4) 


for every map re G II 



and its associated temporal order of events. 


Eq.(SA-2.1.3) is an identity that is intuitively plausible. However, its rigorous proof 
is not so straightforward, and will be given in section SA-2.2. In contrast, Eq.(SA-2.1.4) holds 
only under an appropriate set of conditions on the indel rate parameters. Here, let us delve 
further into this equation. 

The both sides of Eq.(SA-2.1.4) exhibit very similar forms. Each of them is a 
product of the rates of indels that actually occurred or their equivalents, multiplied by an 
exponential. And the exponent is the summation of time-integrated increments, of the exit 
rates of the states that the sequence actually (or virtually) went through, compared to the exit 
rate of the ancestral state. Thus, aside from miraculous, exceptional cases, it would be natural 
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to expect the equations to be satisfied for each of the factors. This reasoning gives two types 
of equations. One is a set of equations for the factors in the product, 

r^M'[k,i k \; s(jt(k,i k )-l), r(k,i k )) = r^M[k, i k ]; 5 . _ p r(k, 4)) — Eq.(SA-2.1.4’a) 


for v £ = v 4 = 1,..., N k , and Vem 


exponent, 


M 


. And the other is an equation for the 


J LHS / 


2/ 


t(ji (v+1)) 

T(3T _1 (V)) 


dr dR™{s{v), s A , r) 


i\ 

k =1 


Y f 1 ‘ dr (s i , s A , r) 


(*( 0 >|=(/|, 

{(s(v)|-(s(v-l)|M’[jr _1 (v)] | v=l. ivj 


( s o \=(s A |. 

4 H .^4 


Eq.(SA-2.1,4’b) 


for ^rGII 


M 

| 

. 

LHS ) 


.Here, we set t(^ _ 1 (0)) = 4 and r(jr~\N+ l)) = t F .In 

Eq.(SA-2.1.4’a), s(jt(kJ l( )- 1) is the sequence state immediately before M'[k,i k ] in the 

global indel history, and s t 1 is the state immediately before M[k, i k ] in the isolated k th 

local indel history. The only difference between both sides of Eq.(SA-2.1.4’a) is in the states. 
In general, s(jt(k,i k )~ 1) on the left hand side resulted from some of the events in the other 


local indel histories, on top of M[k, j] with / < i, . In contrast, y on the right hand side 


will never be impacted by the other local histories. Thus, Eq.(SA-2.1.4’a) simply states, for 
the PWA probability to be factorized, “the rate parameter for each indel operator in each local 
indel history must never be influenced by the actions of any indels that occurred before 
r(k,i k ) and that belong to any other local histories .” 

Meanwhile, Eq.(SA-2.1,4’b) appear more formidable than Eq.(SA-2.1.4’a). 
Nevertheless, we can prove the following proposition. 

[Proposition SA-2.1.1] 

“Let (s-[k, i k ]\ = (s\M'[k, i k ] and ($•[&', i k ,]\ = (s\M"[k',i k ,] (with k, k'(* k) G {1,...,Z}) be 

the states resulting from the actions of the equivalents of events M[k, i k ] and M[k', i k ,], 
respectively, on s G S n . And let 

(s-[k, i k ][k', 4']| = (s\M'[k, i k \M'[k', i k ] = (s\M"[k', i k .]M”[k, i k ] be the state resulting from the 
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consecutive actions of the equivalents of M[k, i k ] and M[k', i k ,] 
exponents, Eq.(SA-2.1.4’b), holds for every global history jt E II 


on 

(\m 


s . The equation for the 
and for each of 


\L J LHS / 

its sub-histories that could occur in any sub-interval, [t n t] with t E [t n t F ], if and only if 
the equation, 

R*x(s, t ) + Rf (s-\k,i k ][k',i k .], t) = R^(s-[k,i k ], t ) + Rf (s-[k',i k .], t ), — Eq.(SA-2.1.5) 


holds for every pair, M[k, i k ] and M[k', i k ,] (with k * k! ), in the LHS M , for every 


II /V /V 

possible state sGS before both equivalents of M[k,i k ] and of M[k , i k .] in the global 


histories in n 


M 


and at any time tE.[t I ,t F ’\. r 


J LHS / 


The detailed proof of this proposition is given in subsection SA-2.3. In the proposition, the 
applicable scope of Eq.(SA-2.1.4’b) was extended to all sub-histories of global histories 


belonging to n 



and to any sub-interval, [t 7 ,t],of \l n t k \. This extension would 


be acceptable in practical analyses, where what we actually want is to factorize all alignment 
probabilities during any time interval. We can clarify the meaning of Eq.(SA-2.1.5) by 
rewriting it as follows: 

dR^(s-[k,i k ][k’,i k ,],s-[k’,i k/ ], t ) = SR r x D (s-[k,i k \, s, t ), — Eq.(SA-2.1.5’) 

6R?(s ■ [k,i k ][k',i k .], s ■ [k,i k ], t ) = 6R'‘\s ■ [k',i k .], s, t). — Eq.(SA-2.1.5”) 

These equations mean that the increment of the exit rate due to an event in a local indel 
history must be independent of the effect of any event in any other local indel history. 

To summarize, we derived a sufficient and nearly necessary set of conditions, 
Eq.(SA-2.1.4’a) and Eq.(SA-2.1.5), under which the integrand of the probability of an indel 
history can be factorized, as in Eq.(SA-2.1.4). To clarify what these conditions mean, we here 
rephrase them in words. First, Eq.(SA-2.1.4’a) can be rephrased as follows. 


Condition (i): ‘ ‘The rate parameter, r^M'[k,i k \; s', r(k,i k )^, for each actually occurred indel 

event (M '[k, i k ]) will not be influenced by the action of any indel events outside of the k th 
local history before r(k, i k ) 

Second, we can rephrase Eq.(SA-2.1.5) as follows. 


Condition (ii): “Let (s(v)| = <^V 4 1 (1)] • • ‘(v)] be the state resulting from the 


actions of events up to (and including) the v th event in a global history corresponding to a 
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map tc G II 


M 


and let M'[jv '(v)] = M'[k(v), i k(v) (v)] be the v th event in the 


\L J LHS / 

global history. Then, the increment of the exit rate, 8Rf (,v(v), s(v -1), t) , due to the event 
M\jt~\v)\ = M'[k(v), 4 (v) (v)], will not be influenced by the actions of any indel events 


^ —1 

outside of the k(v) th local history before M'[jt“ (v)].” 

If this set of conditions is satisfied for all global indel histories in a LHS equivalence class 


M 


, then, Eq.(SA-2.1.4) holds for all integrands. This, combined with the identity on the 


LHS 


domains of integration, Eq.(SA-2.1.3), makes the total probability of 



factorable, thus 


proving the ansatz, Eq.(SM-2.8). [NOTE: Someone might guess that the condition (ii) should 
follow from the condition (i) almost trivially. We will see that this naive guess is wrong in 
subsection R8-2 of the main Results and Discussion.] 


SA-2.2. Proof of factorization of multiple-time integration, Eq.(SA-2.1.3) 

[This subsection was adapted from section A4 of Appendix of [32] .] 

The identity, Eq.(SA-2.1.3): 

f“‘f j]^[rfr(k,l) -dr(k,N k )\ Y[F k (r(k,l),...,r(k,N k 

LHS I 

K ( \ 

- n /•••/ dr(k,l) dr{k,N k ) F k (r(k,l),..., r (k,N k )) , 

k =1 \tj<T(k,l)<---<r(k,N k )<t F y 

— Eq.(SA-2.2.1) 

where j/y (T(k,l),..., r(k,N k fj | k = 1,..., A'j is any set of non-singular functions of 



multiple time points, is one of the two essential elements for obtaining our sufficient and 
nearly necessary set of conditions for a factorable PWA probability. The identity states that, if 
we sum the multiple-time integration operations for global indel histories over a LHS 
equivalence class, it can be factorized into the product of multiple-time integration operations, 
each for a local indel history, over the LHS. Here, we prove this identity in a mathematically 
rigorous manner. 


Let us remember here that n 


M 


LHS / 


denotes the set of maps that correspond to 


global indel histories in a LHS equivalence class. Each of its elements is expressed as: 
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n : (k, 4) (k = 1, K ; i k = 1,.... iV t ) ^ v (= 1,JV). 

Then, we first note that, because the integrands and the sets of variables of integration are 
identical on both sides of Eq.(SA-2.2.1), proving this identity is equivalent to proving the 
equation (modulo differences of measure zero ) between the domains of integration: 


|J D (N) 

jzElTI M 

\L J LHS) 


Jt 


M ;[tj. 


t F ] 


= XD 

k =1 




M[k];[t n t F ] 


— Eq.(SA-2.2.2) 


Here, O' Nk 1 


M[k];[tj,t p ] 


is the domain of integration for the k th local indel history, 


M[k] = [M[k,l],...,M[k,N k ]\: 


D 


wo 


M[k];[tj,t p ] 


= {{r(k ,\),..., r(k,N k )) 1 1 , < r(k, 1) < < r (k,N k )< . 


— Eq.(SA-2.2.3a) 


And D 


(iV) 


jt M ; [tj, t F ] 


is the domain of integration for the global indel history, Jt M 


D 


(AO 


Jt M ; [t n t F ] 


..., t(K,N k )j 


IV 


t, < r(jt 1 (1)) < • • • < r(jt 1 ( N )) < t 


-i/ 


— Eq.(SA-2.2.3b) 

To go further, let us introduce a new notation, n' A ’[iV,, ..., N K ], that represents the 


set n 


M 


. The new notation reminds us that each of the 


N\ 


LHSI 


n.A ! 


elements of 


n 


M 


can be re-interpreted as a rearrangement of K sets, whose sizes are N l ,...,N i 


K ’ 


J LHS J 


into a single set of size N = , N k . Then, each map n K) E n| 


M 

) 

. 

LHS / 


= n ,K) [n„...,n k ] 


can be re-expressed as a composite map, Jt (K> = p ° [( Jt (K ' _l) , / v )j. Here 

, N k _ j ] is a rearrangement of K - 1 of the original A' sets excluding 
the ^ th set, is the identity map from the N k elements in the K th set to 

themselves, and p G n l2) [iV - N K , N K ] is a rearrangement of the K th set and the 
remainder made from the K - 1 sets. The numbers of the elements exactly match, because 
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we have 


N\ 


N\ 


(N -N k )\ 


. Provided that the binary ( i.e., K = 2 ) 


IL*.' <A, - iV ‘ )!iV ‘ ! IL>' 

version of Eq.(SA-2.2.2) is proved, then we can apply them for each fixed 


r (K- 1 ) 


"til 1 [N l ,...,N K _ l ] and all p G IT ’[N-N K , N K ], and we can factor out the 
contribution from the K th local (or “separated”) indel history. 

This is formally proved as follows. First, the left-hand side of Eq.(SA-2.2.2) is 
re-expressed as: 


U D 

*en <x) [v 1 . n k ] 


( n) 


it M ;[ tj,t F ] 


U U 

n' e n‘ ^ K ~' 1 [ v, n K _\ ] [ pen 12 '[ iv-« s , iv x ] 


po[{n\I NK )][M\At n tA 


— Eq.(SA-2.2.4) 

On the right-hand side, we 


have po[(jr',I NK )] M \ = p 


7t 



M[K] 


by definition. Here, 


M' = 


M\ 1], 


M[K- 1] 


is the “reduced” LHS consisting of K -1 out of the original K 


local indel histories in M , excluding the K th local history, M[K]. Substituting this into 
Eq.(SA-2.2.4), and assuming that Eq.(SA-2.2.2) holds with K = 2 , we have: 


U D 

[N,.. 

U 


/J 


,(N-N t ) 


.jv x _,] 


U D 


M I, \_tj, t F ] 


x D 




(N-N k ) 




x I) 


,(Aft) 


M[k\;[t,,t F ] 
M[k \; [t,,t F 


— Eq.(SA-2.2.5) 


This series of equations re-expresses the above verbal reasoning in clear mathematical terms, 
and formally demonstrates that the domain of integration for the rightmost local indel history 
(i.e., the K th local history) is indeed factored out. Iteratively applying the above reasoning 
to the remaining set of K -1 local indel histories, we can prove that the domains of 
integration for all local indel histories can be factored out. This finally gives Eq.(SA-2.2.2) 
and thus proves the identity, Eq.(SA-2.2.1), i.e., Eq.(SA-2.1.3). Thus, the problem at hand 
was reduced to proving Eq.(SA-2.2.2) with K = 2 , which we will call the “binary domain 
identity” here. It is rewritten as: 
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u 


D 


(N,+N 2 ) 


pen < 2 ) [w,,iv 2 ] 


M[1],M[2] 


\\t„ 


t F ] 


= D 


(JVi) 


M[l];[t n t F ] 


x D 


(N 2 ) 


M[2]-,\t n 


t F \ 


— Eq.(SA-2.2.6) 

Using Eqs.(SA-2.2.3a,b), and setting t. = t(1,z) and r' = r(2,i ), it can be rewritten further 
as: 

U {(t v ...,t n -t' v ...,t' N 2 )\ t, <T(p-‘(i))< < r(p-\N l + A 2 )) < 

pen < 2 ) [AT,,iv 2 ] 

= | <Tj <"-<T JVi <t F }x | tj <x[ <---<t^ 2 <t F } . 

— Eq.(SA-2.2.6’) 

(It should be noted that, in this subsection, the identities between the domains are considered 
modulo differences of measure zero.) 

We will prove this identity, Eq.(SA-2.2.6’), via mathematical induction regarding 
N 2 . First, we show Eq.(SA-2.2.6’) with A 2 = 1 holds for every fixed positive integer A,. In 
this case, n (2) [Aj, N 2 = 1] consists of A, +1 elements, each of which inserts the event in 
the 2nd local history between the i th and i +1 th events in the 1st local history 
(i = l,...,Aj -1), or places it before or after all events in the 1st local history. Thus, we have: 

U ,')| t, < r(p“'(1)) < • • • < r(p-'(A, +1)) < t F } 

pen'Uiv,.!] 

N i 

= U{( T ,,-, T 'V,;<) \t I < T l < --- < T Ni < t F ,T i < r[ < T,. +1 } 

1=0 

Nl 

(Tj,..., t Ni ; t[) 1 1 , < r x < ■ ■ • < t Ni < t F , (J {r, < r[ < r M } 

i=0 

= |(T 1 ,...,T iVi ; Tj) | tj < Tj < • • • < X Ni <t F , tj <T\ <t F | 

= |(T 1 ,...,T iVi ) | tj < Tj < • • • < T jV| < A| X ) | t/ < U < tp} . 

Here we set t 0 = t 2 and x N +l = t F . This shows that Eq.(SA-2.2.6’) with A 2 = 1 holds for 
every A, (E N, (N, is the set of positive integers). 

Next, let us assume that the binary domain identity, Eq.(SA-2.2.6’), holds for 
A 2 = A and for every A, EN, , and see if the identity also holds for A 2 = A +1. For this 
purpose, we classify p G n (2) [A 1? A + l] according to the position of x' N+l relative to 
r,,..., x Ni , and let n' (2) [A p A + l; i] (with i = 0,1,..., A,) be the subset of n (2) [Aj,A + l] 
whose elements satisfy r ; < x' N+x < x M . Here we set x 0 = /, and x N +1 = t F again. For every 



p G n' 17 '[A,, A + l; /], there exist a unique ff£n |,| [i,A] such that: t(p '(v)| = t((T ‘(v) 


for v = l,...,A + /, =x' N+l for v = A + z + l,and =x v _ NA for v = A + z + 2,..., A + A, +1. It 
could also be represented as: 
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(T(p' 1 (l)),...,'r(p“ 1 (W + A^ 1 +l))) = (r(a“ 1 (l)),...,T(cr“ 1 (W + 0),< +1 ,T,. +1 ,...,T iVi ). 

- Eq.(SA-2.2.7) 

Thus, a E U l2) [i, N] corresponds to the sub-history before t' v+i . Taking advantage of these 
facts, the left-hand side of Eq.(SA-2.2.6’) with N 2 = N + 1 is re-expressed as: 

(^J |(Ti,...,Tjv,, | < t(p (1)) < •• • < r(p (A| + N 2 +1)) < t F | 

pen‘ 2 , [v,,iv+i] 


' ’ I 

-u 


1=0 

N, 


-U 


1=0 


y |(Ti,...,T' iVi , 1< t(p (1)) < • • • < t(p (A| + N 2 +1)) < p.j 

pen' <2) [Ar 1 ,w+i;i] 

U {<* 1 E> — Ev + 1 ) 1 1, < r (o~ l (1)) < < T(cr _1 (N + 0) < r' N+1 < r M < ■ ■ ■ < r N> < t F } 


cren (2) [/, N] 


— Eq.(SA-2.2.8) 

Applying the assumed Eq.(SA-2.2.6’) with N 2 =N and A, = z, and with t F replaced by 
t' v+i , to the expression in the square bracket on the rightmost hand side of Eq.(SA-2.2.8), we 
have: 

[J {(t 1 ,...,t JV i ; Tj',...,r^ +1 )| p <r(o~ 1 (l))<---<r(o~ l (N + i))<r' N+1 <r M < — <r Ni < t F } 
r, < r, < • • • < r ( < T-; +1 < r i+1 < • • ■ < T Wi < f F ,1 


cren <2l [/, v] 


- T|,...,T a , +| ) 

= {(Tj,...,T jv r , T 1 ,...,'T A , +1 ) 


tj C C • - C T N < t n+] 


t I <T l <---<r i <T M <---<r Ni <t F , 
t,< r[<---<r' N < t' n+1 , r,. < < +1 < r /+l 


— Eq.(SA-2.2.9) 

Substituting Eq.(SA-2.2.9) back into the rightmost hand side of Eq.(SA-2.2.8), we finally get: 
y {(r 1 ,...,T ]Vi ;r 1 ',...,T^ +1 )| p <r(p“ 1 (l))<---<T(p _1 (A l +A 2 +l))<t F } 


V, 

=u 

f (T 

v *'1V 

E’- 

•’Ev+l) 

v - 

V 

< E < E+l < "' <t N l <t F’ j 

, , , 1 

1=0 

i 



p < T, < 

■ ' < Ev < T iv+i ’ E < Ev+i < E+i J 


(t 1 ,...,t jv , t 1 ,...,t n+1 ) 


\ (E’ — ’Ev, ’ T ivTat + i) 


P < E < • • • < Tjy, < 


p<r; <•••<< <t; +1 , ua 


p < T, < • • • < T ;V| < t F , 


P < E < ‘ < Ev < ^N +1 ’ b < T V+1 < A I 


AZ+1 I E < r jV+l < E+l 
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- |(T 1 ,...,T JVi ; T\,...,T n+1 ) | tj < Tj < • • • < T W| < tj < Tj < • • • < t a , +i < ? F | 

= | tj < T, < ■ ■ • < T Ni < } X \t,<r[<---< t' N+1 < t F } . 

— Eq.(SA-2.2.10) 

This final expression is nothing other than the right-hand side of Eq.(SA-2.2.6’) with 
A 2 = A + 1. Thus, assuming that Eq.(SA-2.2.6’) holds for A 2 = A and for every A, G N,, 
we did indeed show that it also holds for N 2 = A +1 and for every A, G N, . Therefore, the 
binary domain identity, Eq.(SA-2.2.6’), holds for every pair, (A p A 2 ) G N, x N,. This 
completes the proof of our key identity, Eq.(SA-2.2.2), and therefore the proof of the 
factorization of the multiple-time integration, Eq.(SA-2.2.1). 


SA-2.3. Proof of proposition SA-2.1.1 for factorization of exponential factor 
[This subsection was adapted from section A5 of Appendix of [32].] 

The other core element for factorable PWA probabilities is the proposition SA-2.1.1: 

“Let (s-[k, 4]| = (s\M'[k, i k \ and (s-[k', 4 ]| = (s\M"[k\ i k ,] be the states resulting from the 

actions of the equivalents of events M[k, i k ] and M[k\ i k ,], respectively, on ,s - G S .And 

let (s ■ [k, i k ][k', i k ,]\ = (s\M'[k, i k ]M'[k', i k ,] = (s\M"{k\ i k ,]M"[k, i k ] be the state resulting from 

the consecutive actions of the equivalents of M[k, i k ] and M[k',i k ,] on s . The equation for 


the exponents, Eq.(SA-2.1,4’b), holds for every global history rt G IT 


M 


and for each 


LHS / 


of its sub-histories that could occur in any sub-interval, [t,,t] with t G [t n t F ], if and only if 
the equation, 

K\s, t ) + R?(s-[k,i k -\[k',i„], t) = R' x D (s-[k,i k \, t) + Rf (s-[k',i r ], t ), — Eq.(SA-2.1.5) 


holds for every pair, M[k, i k ] and M[k', i k . \ (with k ^ k '), in the LHS M , for every 


possible state sG5 ff before the equivalents of M[k,i k ] and M[k', i k ,] in the global 


histories in n 


M 


and at any time fG[f,,t F ].’ 


A LHS ) 


The proposition provides an essential part of our sufficient and nearly necessary set of 
conditions for the factorability of the PWA probability. Here, we prove this proposition via 
mathematical induction, similarly to the proof in subsection SA-2.2. 

We first reduce the problem into a binary one by mathematical induction regarding 
the number of local indel histories, K . As in subsection SA-2.2, let n' A ) [A,,..., N K \ denote 
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the set of maps, II 


M 


, each of whose elements is a rearrangement of K sets, of sizes 


J LHS / 


N 

N v ..., N k , into a single set of size N = ^ AT*. And re-express each map 


it G n (k) [n v ... ,N k ] as a composite map, it = p° [(cr, I Nt ) j , where 
o G ..., N ka ] and pG n ( 2 ) [A- N K , N K ] . Then, also as in subsection SA-2.2, we 


have po[{o,I NK )]\M\ = p 


<j\M'\,M[K] 


by definition, where M' = 




is the reduced LHS defined below Eq.(SA-2.2.4). Thus, if the binary version of the 


proposition SA-2.1.1, with n 


M 


replaced by n' 2 ’ [N - N K , N K ], is true for each fixed 


J LHS / 


crG n (K -\n x ,... , N k _J , we have the binary version of the factorization, Eq.(SA-2.1.4’b): 




{sW)\=(s A \ 

{(■y(i / )|=(^(v-l)|M'[^r" 1 (v)] | v=l,...,AfJ 


\N-N k 


v'=0 


-i 


(s\0)\-{s A \ 

|(i'(v)|-(i’(v'-l)|M"[(7- 1 (v')] | v'=l./V-zVy! 


— Eq.(SA-2.3.1) 


y f T ' k lK+ ' ) clT6R"\s i ,s a ,t) 
Zj J x(k,i v ) x ‘k 


(’ s °K s ^| 

{( s * H S I K -1 l^t*. i K l\ *jc — 1.^4 


for every possible it = p° |^(a, I Nt ) j with the fixed a and any pG n ( 2 ) [A-A^, N K ]. The 


first summation on the right-hand side is the left-hand side of Eq.(SA-2.1.4’b) with 
it G n' A) [A,, ..., N k ] replaced by a G n ( *“ 1 ) [./V 1 , ..., N K] j . Thus, the problem was reduced 
to that of the factorization for the global indel histories equivalent to a set of K -1 local 
indel histories. By iteratively applying the binary version of the proposition SA-2.1.1 to the 
reduced problems, we will finally obtain the fully factorized form, i.e., the right-hand side of 
Eq.(SA-2.1.4’b). 

Therefore, all we have to do is prove the binary version of the proposition SA-2.1.1. 
To do so, we will rewrite it into a more tractable form. We first pick two integers, 
i G {0,1,..., A,} and j G {0,1,..., N 2 } , and consider all sub-histories of indels composed of 


two local sub-histories, 


M[ 1,1], 


MW, /j] 


and 


M[ 2,1], 


M[2,;]].(If i=0 or ; = 0, 
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the corresponding local sub-history is considered as empty.) Each such sub-history 
corresponds to a map, p G n ( 2 ) [i, j] , and the state resulting from the action of this 
sub-history on the state s A E S 11 is represented, e.g., as: 

(s A ■ p | = (s A | M'[p~ x ■ M\p~\i + j )]. As in section R5 of the main Results and discussion, 


through the binary equivalence relations, Eq.(R5.2a-d), we can show that 
sub-history p E n ( 2 ) [i, j] is in fact equal to the state: 

<s[i;j]|-i]---^[2, i]---^[l,*n] (es"), 


for each 


Eq.(SA-2.3.2a) 


that is uniquely determined solely by the local sub-histories, 


M[ 1,1] 


M[ 1, i]J and 


M[2, 1],...,M[2, _/]], and the initial state, s A G S 11 . That is, the state s A ■ p (= s[i m , j]) is 

independent of further details of pE n ( 2) [/, /]. (Naturally, we have ,s[0,0] = s A .) Thus, the 
binary version of the proposition SA-2.1.1 is rephrased as follows. 

[ Proposition SA-2.3.1 ] 

“Eq.(SA-2.1.4’b) with K = 2 holds true for ' , jr£II l!l [iV l ,)V,] and for each of their 
sub-histories during [t n 1 1 with v / G ( t n t F ) if and only if the equation, 

R^(s[i-\-j-\],t) + Rf(s[i-j],t) = dRf(s[i-j-\],t) + dR I x D (s[i-\-j],t), 

— Eq.(SA-2.3.2b) 

holds for v i G {1,..., A,} , v j G {1,..., N 2 } , and for v tG(f ; ,t f ).” 


Here comes the proof of the proposition SA-2.3.1. First of all, we rewrite Eq. 
(SA-2.3.2b) in two different ways, as: 

<5 Rx(s[i;j], s[v,j- 1], t) = SR^isU-l-J], t ), -Eq.(SA-2.3.2b’) 

and 

6R' x d (s[vJ], s[i-VJl t) = 6Rf{sU\j~ 1], s[i-l;j-l], t) . -Eq.(SA-2.3.2b”) 

These equations collectively indicate that the increment of the exit rate due to an indel event 
in one local indel history will not be influenced by the past events in the other local history. 
Indeed, these equations can be “solved” to give: 

8Rf{s[i-,n, ) = 8Rf (s[0;y], s[0;j-l], t ), -Eq.(SA-2.3.3a) 

8Rf(s[i-n, s[i-l;j], t) = 8Rf (s[i;0], s[i-l;0],t) . -Eq.(SA-2.3.3b) 

The right-hand sides of Eq.(SA-2.3.3a) and Eq.(SA-2.3.3b) are, respectively, the increment 
purely within the 2nd local history and that purely within the 1st local history. Replacing i 


19 





with i' in Eq.(SA-2.3.3b), and summing the result over i' = , we find: 

i 

s[0\j\, t ) = (s[i';j], s[i' 

i '=1 
i 

= (s[i';0], s[z'-l;0], t) = (s[i;0], s[0;0], t). 

i '=1 

Using M®(s[(';j],i[0;j],O = ^f and ^[0,0] = / , we get 

a key equation: 

f><(44./], t) = c5i?f (s[i;0], s\ t) + <( S [0;j], /, t) . - Eq.(SA-2.3.4) 

This means that the increment of the exit rate by a sub-history p G lT 2) [z, j] is decomposed 
as the summation of two increments, each by one of the local sub-histories, 


M[1,1],...,M[1,/]] and [M[2,1],...,M[2, j] . 


Now, pick an indel history corresponding to a map tc G IT 2 '[ /V,, N 2 \, and consider 


the left-hand side of Eq.(SA-2.1.4’b) with K = 2 , i.e., ^ J (( ^ ^ dr 8R(s(v), s A , r) 


with (5(0)| = ^5 a and (s(v)| = ^s(v-l) M'[jt '(v)] for v = 1,..., N . Let i k (v) (k = l,2)be 


the number of events in the local history 


M[k, 1 M[k, N k ] that are equivalent to those 


included in the sub-history 


( 1 )] 


M'[jt *(v)] (v = 0,l,...,A). Then, we have 


z l (v) + z' 2 (v) = v , and s(v) = s[ij(v); i 2 (v )\. Thus, using Eq.(SA-2.3.4), the left-hand side of 
Eq.(SA-2.1.4’b) with K = 2 can be decomposed into the contributions from two local 
sub-histories: 


2IZZT dr a <W(O;0], * A , r) + 2IIZ»'" dT 4(v)],* A , T). 


T (jr (v+l)) 


— Eq.(SA-2.3.5) 

In each summation, i k (v) remains a particular value, e.g., l k , since v = ji([k, i k ]) until (and 
excluding) v = Jt([k, i k +1]) (for k = 1,2). Taking account of it, Eq.(SA-2.3.5) becomes: 

N i 

+ (s[0; 4 ], t) . — Eq.(SA-2.3.5’) 

! 2 =0 

From the definition of s[i; j ], Eq.(SA-2.3.2a), we can see that Eq.(SA-2.3.5’) is nothing other 
than the right-hand side of Eq.(SA-2.1.4’b) with K = 2 . The argument after Eq.(SA-2.3.4) 
applies to every history corresponding to ffGII 131 ^,#,]. Thus, we proved that 
Eq.(SA-2.1.4’b) with K = 2 holds if Eq.(SA-2.3.2b) holds. 


If 

;,-o 


t(1,»i+1) 


dr SR'xisiZ; 0], s A , r) 
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To prove the converse, we now assume that Eq.(SA-2.1.4’b) with K = 2 holds for 
the indel history corresponding to every tc G II' 2 ) [N 1 , N 2 ], as well as for each of its 
sub-histories during [t,, t ] with v / G (t n t F ). Then, taking the time-derivative of both sides 
of Eq.(SA-2.1.4’b) with K = 2 for any incomplete time-interval [/,, /1, we have, for a 
particular tc G II' 2 ’ [ N 1 , N 2 ]: 

8R x (s(v), s A , t) = (s[i,(v);0], /, t) + 8R^(s[0;i 2 (v)],s A , t ), 

using the i k (v) (k = 1, 2) defined above. Because this equation holds for any time-interval 

[t r , t] C [tj, t F ] and for every map tc G II (2) [./V 1 , N 2 ], we get exactly Eq.(SA-2.3.4) for 
v i G {0,1,..., A,} , v j G {0,1,...,A 2 } , and for v tG(f ; , t F ). Then it is easy to show 
Eq.(SA-2.3.2b). Starting with the right-hand side of Eq.(SA-2.3.2b), we find: 

8Rf(s[i-j- 1], 5 a , t)+ 8Rf (s[i-l;y], 5 A , f) 

= 0], 5 a , t) + 8R?(s[0-,j - 1], 0} + (s[i - 1;0], s A ,t) + 8Rf (s[0;/J, /, f)}. 

Swapping the 1st and 3rd terms on the right-hand side, we have: 

8Rf (s[i; j - 11, v 4 , t) + 6R" } (, s[i-l;j],s A ,t) 

= {<5/?f (s[i-l;0], s A , t) + 6R I x D (s[0-J-l],s A , t)} + {dR‘ x D (s[r,0], s A , t) + dR x (s[0;j],s A , f)} 

= 8R‘ x D (s[i-l-J-1], s A ,t) + SR' x D (s[r,j], s A ,t) . 

Adding 2R”’(s'\ t) to the leftmost and rightmost sides of the above equation, we get 
Eq.(SA-2.3.2b). Thus, the converse was proved. 

This proof of the proposition SA-2.3.1, combined with the proof above it, which 
resorts to the mathematical induction regarding K given the proposition SA-2.3.1, 
completes the proof of the key proposition SA-2.1.1. 


SA-3. Total probability of LHS equivalence class under “long indel” model 

[This section was adapted from section A6 of Appendix of [32] .] 

Here, we consider the “long indel” model [21]. Its indel rate parameters are given as follows: 


r,{x, l;s,t) 


X 

X\ end) 

whole ) 


for 1 < x < L(s) - 1, 

for x = 0, L(s) with L(s ) > 0 , 

for x = 0 with L(s ) = 0 , 


— Eq.(SA-3.1a) 


r D (x 


g,X E , X, t ) 


Pin 

ti: d) 

~ (whole) 

/Id 


for 1 < x B < x E < L(s ) -1, 
for x B = 1 or x E = L(s), 
for x B =l and x E = L(s). 


— Eq .(SA-3.lb) 
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Here L(s) is the number of sites in the sequence state s S" j. And we set l D = x E - x B + 1, 


[CO _ [CO 

jl\ end) = 2 j" t l^i , and jx\ whole) = (1-1 +1 )/i r . [NOTE: If the time reversibility is imposed 

on the model, the following equations also hold [21]: A, = (A, / q, / q 7 , k) end) = (A, / q ,)' jx\ end) 
and A' whole1 = (A, / q, / p} I whole) . In this section, however, these equations do not play any 


particular roles.] The bulk parts of the above indel rates could be related to those in the indel 
model of Dawg [26] as follows: 

A, = K MO, A, = 2^ A, , q, = A d / d (/), A d = 2 ,^ • — Eq.(SA-3.2a,b,c,d) 

Here A 7 and A /; are the total rates per site of bulk insertions and deletions, respectively, and 
f](l) and / w (/) are the length distributions of insertions and deletions, respectively. (See 
also subsection R8-1 of the main Results and discussion.) Then, the exit rate under the “long 
indel” model can be calculated as: 


R?(s, t) = (A 7 + A d )L(s) +A io ' !g [A / , {A ; ( ™°}, X D , /„(.)] 


— Eq.(SA-3.2e) 


r~ i / __ [C° _ \ _ _ __ rco 

Here, [a„ {A ; ( ^}, A d , / d (.)] - -A, + 2 A;“ rf) + A D (l D - 1) , with l D = ^ 1 f D «), is 


a constant that depends on the indel rate parameters. 

Under this “long indel” model, we will calculate the total probability of a 
local-history-set (LHS) equivalence class of (global) indel histories, conditioned on a given 
ancestral sequence, according to the prescription proposed by Miklos et al. [21]. And we will 
show that the probability calculated this way is indeed identical to that calculated via our ab 
initio theoretical formulation. 

We first briefly review the method of M ik los et al. [21]. In their method, each PWA 
is scanned from left to right, and horizontally partitioned into “chop-zones.” In the bulk of the 
PWA, a chop-zone starts immediately on the right of a preserved ancestral site (PAS) and 
ends exactly at the next PAS. The leftmost chop-zone starts at the left-end of the PWA and 
ends exactly at the first PAS if at all, or otherwise ends at the right-end of the PWA. The 
rightmost chop-zone starts immediately on the right of the rightmost PAS, if at all, and ends 
at the right-end of the PWA. It should be noted that each chop-zone contains at most one PAS, 
and that the PAS contained in the chop-zone always resides at the right-end of the zone. 

Conceptually, the conditional probability of each chop-zone (conditioned on the 
existence of the left-flanking PAS, if at all) is calculated by summing the contributions of all 
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local indel histories consistent with the homology structure [48] of the chop-zone. Then, 
according to the recipe of Miklos et al. [21], (the indel component of) the probability of a 
given PWA, conditioned on the ancestral sequence, is given as the product of the conditional 
probabilities over all chop-zones that make up the PWA. Therefore, by extension, Miklos et 
al.’s total probability of a LHS equivalence class of indel histories (consistent with the PWA) 
should be given by the product of the contributions from the local indel histories (including 
the empty histories), each confined in every chop-zone, over all chop-zones constituting the 
PWA. This is exactly what we will calculate in the following. 

Now, as in section R6 of the main Results and discussion, consider a LHS 


equivalence class, 


M 


with M = 


LHS 


M[k, 1], 


M[k,N k ] 


k=l . K 


that is consistent with a 


given PWA, a(s A ,s D ), of an ancestral sequence (s A ) and its descendant ( s D ). As near the 
bottom of section R6, we can define the regions of a(s A ,s D ) each of which potentially 
accommodates a local indel history, namely, y,, y 2 ,..., y K , as the region on the left of the 

leftmost PAS, the regions between two PASs next to each other, and the region on the right of 
the rightmost PAS. (Because the indel model at hand is space-homogeneous and has freely 
mutable flanking regions, every local indel history in each such region is independent of the 
histories outside, both physically and regarding the multiplication factor, as shown in 
Subsection R8-1 of the main Results and discussion.) Then, by appropriately distributing the 
local histories into such regions, we can provide a vector-representation of the LHS: 


4 - %._]) 


. Using these regions, each chop-zone of Miklos et al. [21] 


can be constructed by putting together a region y K with its right-flanking PAS (for 
k = 1,..., /c max -1), or by a region alone (for k = /c max ). According to Appendix A of [21], the 


contribution from the local history, M[yJ = [M^yJ,..., M N [yjj, in the chop-zone, cz(y K ) ■ 
that is associated with y K is calculated as: 


p 

1 Mik 


M[y K ],[t I ,t F ])\(s A \cz(y K )\,t I ) 




* /•••/ dr ~ ■ " dT «, e.'P j-5A», - T,) : £■:<}'. i)! 


v=0 


b-‘ s [cz()V)], 

(^K0v-i|M v [yJ|v=l,...,W,} 


— Eq.(SA-3.3) 

Here, [ cz.(y h ) ] is the portion of the ancestral state confined in the chop-zone cz(y h ), and 
(p (i (= ,v''[cz(y K )]), .... (p N are the chop-zone-confined states that the local indel history went 
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through. The expression is quite similar to each term in the perturbation expansion, Eq.(R4.7). 
Because each indel rate, r(M v [y K \, (p v _ j) , is independent of time, it was put outside of the 
multiple-time integration. And, because each “exit rate,” R'x( Mik) ((p v ', cz(y K )) (detailed later), 
is also time-independent, its time integration (in the exponent) was reduced to a simple 
multiplication by the time-lapse (r v+1 - r v ). The “exit rate” R'^ Mik) ((j)y cz(y K )) needs some 
explanation. Because each chop zone (except cz(y x )) is defined conditionally on the PAS 
that is left-flanking the zone, and because we now know that the probability is factorable (see 
subsection R8-1 in the main Results and discussion), we do not have to consider deletions that 
pierce this PAS. Neither do we have to consider indel events completely outside of the chop 
zone. Therefore, taking advantage of the space-homogeneity of the indel rates, using the 
relationship with Dawg’s indel model [26] , Eqs.(SA-3.2a,b,c,d), and letting L(</;.,) be the 
number of sites in the state </>,, (including the PAS at the right-end of the zone, if at all), the 
“exit rate” R™ Mik) (<p v ', cz(y K )) according to Miklos et al.’s definition is expressed as: 



for k = 2,..., K: max -1. It should be noted that the summation over the insertion positions ( x ) 
has the upper bound x = L((p v )~ 1, because an insertion on the immediate right of x = L((p v ) 
belongs to the right-neighboring chop-zone (cz(y ff+1 )). The summation over the indel lengths 
(/’s) is easily performed, and we get: 


R'xwk)^ c z(Y k )) = (^/ + K)L(.<t> v ) for k = 2, ..., K mdx -1. — Eq.(SA-3.4a’) 


^X(Mik) 


When k = K : max (^ 1), the expression of R^ {Mik ((j)y cz(y,,)) is almost the same as 
Eq.(SA-3.4a); the only difference is that it also needs to include the insertions right-flanking 
the PWA (i.e., with x = L((p v ) ), whose rates arc given by X\ end) in Eq.(SA-3.1a). Thus, we 
have: 



When k = 1(* K max ) , Eq.(SA-3.4a) is still useful, but we need two modifications, both 
because this chop-zone is not left-flanked by a PAS. First, insertions on the left-end (i.e., with 
x = 0 ) must have the rates X\ end) , again given in Eq.(SA-3.1a). Second, deletions “starting” 


at x = 1 must have the rates 



modifications, we have: 
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jLU tC(J tCU tLU i< tLU 

when K m . dX * 1 • Because ^ jl\ end) = J,” 2/C ^ = 2/C 2/=A = 2/C l ' & = 4> A d , we get: 


L C° „ _ 

cz(r,)) - (A, + A„)i(0,)- A, + 2,' A, 1 ”'’ + A„(/„ -1). - Eq.(SA-3.4c’) 


From Eqs.(SA-3.4a’,b,c’), we find that R^u^ifa ,’s are always affine functions of 
L(<p v ) with the slope (A, + A D ), which is the same as that of the exit rate, Rf (s, t) given 
by Eq.(SA-3.2e), for the evolution of an entire sequence under the “long indel” model. Thus, 
we have: 


^nuifafa t-v «(yj) s R x<mfafa cz(y K )) ~ Rl x D mk )(<P v -v cz(y K )) = (A, + A D )8l (M v [yj), 

— Eq.(SA-3.5) 

where dl^M v [y K fj is the change in L(cp v ) caused by the event MJy ( ]. This is exactly the 
same as the increment of the (actually time-independent) exit-rate: 
dR?(s-M v ty K ],s, t) = Rf (s-M v [yJ, t) - R?{s, t) = (A, + A D )<5/(M v [yj), 


— Eq.(SA-3.6) 

caused by the event M v [y K ] on the entire sequence. By successively applying M v ,\y K \ 
(V = l,...,v), we have: 


dRx mk) (^^cz(y K )) = dR^(s v 


S A ) 




— Eq.(SA-3.7) 


Therefore, we can rewrite the exponent in Eq.(SA-3.3) as: 


-2 (t v + i “ T v) R'xfaSfa «(yj) 

v=0 

n k 

= - ( r ^ + ! - T 0 ) R, xmk)^ cz(y K )) - 2(* V+ 1 - * v ) d Rl x D wk)(fa’ fa «(yj) 

■=o 

-T v )dR"\s v ,s A ) 


- ( T N K+l 1 ‘o)Rx(,Mik)(fa’ C Z(Yk )) 


v=0 




— Eq.(SA-3.8) 

Substituting this back into the right hand side of Eq.(SA-3.3), and comparing the result with 
Eq.(SM-2.7) in Additional file 1, we have: 
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Mik 


M[y K ],[tnt F ] IcziY^t,) 


= exp{-(f F - t,)R^ m) (s A [cz(y K )\\cz(y K ))} u P 


M[yJ,[t I ,t F ])\(s A ,t I ) 


— Eq.(SA-3.9) 

According to the method of Miklos et al. [21] , we will define the total probability of the LHS 


equivalence class, 


M 


LHS 


with M = \M[y 1 \,M[y 2 \,..., M[y K ] , as: 


- Mik 


M 


,[t I ,t F ]\\(s A ,t I ) 


LHS 


-ni:^ 


M[y K Ut n t F ] (s A [cz(yJU) 


— Eq.(SA-3.10) 

Substituting Eq.(SA-3.9) into Eq.(SA-3.10) yields: 


' Mik 


M 


’ [^/T/r] ( 5 ,tj) 


J LHS 


= exp j-a F - R xm k) (s A [cz(y K )]; cz.(y K ))\ 


M[y K ,],[t I ,t F l)\(s A ,t I ) 


— Eq.(SA-3.10’) 

Substituting Eqs.(SA-3.4a’,b,c’) into the summation in the exponent on the right hand side, 
we get: 

— Eq.(SA-3.11) 


= (A / + A D )2JE(/Myj]) + -A /+ 2 2 /= ; +A d (Z d -1) . 


On the right hand side, the expression in the braces is exactly S Long [A 7 , {k\ end) }, k D , / D (.)j in 


Eq.(SA-3.2e), and we also have ^ L(s A \cz.(y K )\) = L(s A ). Thus, the expression is further 

reduced to: 

YZ 'C-tA^ooi;«(/,))- a, + i d )l( S a ) +a“ ! [a„ a d ,/„(.)] - <(/,») 

— Eq.(SA-3.11’) 

for K: max > 1. [NOTE: In the case where /c max = 1, by the way, arguments similar to those 
leading to Eqs.(SA-3.4b,c’) reveals that R' d> {Mjk) (s A \czX)\)\\ cz(y ] )) = R^(s A , t) holds, and thus 
that Eq.(SA-3.11’) trivially holds.] Now, substituting Eq.(SA-3.11’) back into Eq.(SA-3.10’) 
while taking account of the time-independence of Rf (s A , t ) under this model, we finally 
get: 
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p 

1 Mik 


M 


> U/Tf] (■? ? h ) 


JLHS 


= expj-JJ’ dr R"’(s A , r)| Y[p p 


M[y K Ut I ,t F ])\(s A ,t I ) 


— Eq.(SA-3.10”) 

The right hand side of Eq.(SA-3.10”) is exactly that of Eq.(SM-2.12) (in Additional file 1) 


multiplied by P ([], [t,,t F ]) | (s A , t,) j = expj-J dr R'x(s a , t)| . It is nothing other than the 


total probability of the LHS equivalence class 



calculated via our ab initio theoretical 


LHS 


formulation, under the “long indcl” model, Eqs.(SA-3.1a-d) (and Eqs.(SA-3.2a-e))(2.4.7a-e). 

Actually, this equivalence between the probability via our ab initio formulation and 
that via Miklos et al.’s method [21] should hold under any indel models with factorable PWA 
probabilities described in Section R8 of the main Results and discussion, as long as the 
“chop-zones” are re-defined appropriately. Its explicit proof will be left as an exercise for the 
readers. (The key should be the decomposition of the entire exit rate into the contributions 
from (modified) chop-zones.) 
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